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APPROXIMATING THE CONDITIONAL DENSITY GIVEN LARGE 
OBSERVED VALUES VIA A MULTIVARIATE EXTREMES 
FRAMEWORK, WITH APPLICATION 
TO ENVIRONMENTAL DATA 
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Phenomena such as air pollution levels are of greatest inter- 
est when observations are large, but standard prediction methods 
are not specifically designed for large observations. We propose a 
method, rooted in extreme value theory, which approximates the con- 
ditional distribution of an unobserved component of a random vector 
given large observed values. Specifically, for Z — {Zi, . . . , Zd)"^ and 
Z_d = {Zi, . . . , Zd-i)'^ , the method approximates the conditional dis- 
tribution of [Zd|Z_d — z^d] when ||z_tj|| > r,. The approach is based 
on the assumption that Z is a multivariate regularly varying random 
vector of dimension d. The conditional distribution approximation re- 
lies on knowledge of the angular measure of Z, which provides explicit 
structure for dependence in the distribution's tail. As the method pro- 
duces a predictive distribution rather than just a point predictor, one 
can answer any question posed about the quantity being predicted, 
and, in particular, one can assess how well the extreme behavior is 
represented. 

Using a fitted model for the angular measure, we apply our method 
to nitrogen dioxide measurements in metropolitan Washington DC. 
We obtain a predictive distribution for the air pollutant at a location 
given the air pollutant's measurements at four nearby locations and 
given that the norm of the vector of the observed measurements is 
large. 
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NO_2 Measurements 09/09/2002 




Fig. 1. Locations of the NO2 monitors used in the Washington DC study. Locations are 
Alexandria (alx), McMillan (mc), River Terrace (rt), Takoma School (ts) and Arlington 
(ari). Also shown is the boundary of the District of Columbia. Measurements for Septem- 
ber 9, 2002 are shown for the four locations we use for predicting the measurement at 
Arlington. 

1. Introduction and motivation. Nitrogen dioxide (NO2) is an air pollu- 
tant which is among those monitored by the US Environmental Protection 
Agency (EPA). Figure 1 shows NO2 measurements at four locations in the 
Washington DC metropolitan area on September 9, 2002. This day's mea- 
surements are particularly large: each of the four measurements exceeds the 
0.97 quantile of the empirical distribution for its location. Certainly, air pol- 
lution levels are of most interest when pollution levels are high. It is natural 
to ask, given the measurements at these four locations and given that they 
are large, what can be said about pollution levels at nearby unmonitored 
locations? 

Linear prediction methods are questionable when the data are non-Gaus- 
sian, and a better approach may be to approximate the conditional density. 
Extreme value theory leads one to describe the joint tail with non-Gaussian 
distributions, and dependence in the tail is typically not well described by 
covariances upon which linear prediction methods rely. In applications like 
the above air pollution example where interest lies in the large occurrences, 
approximating a conditional density allows one to answer important ques- 
tions such as "Given that nearby locations' measurements are large, what is 
the probability a certain unmonitored location exceeds some critical level?" 
or "Given that nearby locations' measurements are large, what is a reason- 
able probabilistic upper bound for the air pollution level at an unmonitored 
location?" 
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Our method to approximate the conditional density is based on extreme 
value theory and is therefore specifically designed for instances when the 
observations are large. Extreme value theory provides a framework for de- 
scribing the dependence found in the joint upper tail of the distribution, 
and, at the same time, does not require knowledge of the entire joint dis- 
tribution. In particular, we will assume that the joint distribution of the 
observations and the random variable we wish to predict are multivariate 
regularly varying, and we use the angular measure of this random vector 
to approximate the conditional density. By approximating the conditional 
density, we are able to address questions of the type posed above about the 
unobserved random variable. 

In the next section we provide some necessary background on extreme 
value theory. In Section 3 we discuss prediction for extremes; we review 
previous related work in Section 3.1 and then introduce our method in Sec- 
tion 3.2. In Section 4 we apply the prediction method to the Washington 
DC air pollution data. We conclude with a summary and discussion section. 

2. Characterizing extremes, multivariate regular variation and the angu- 
lar measure. Extreme value analysis is the branch of statistics and prob- 
ability theory whose aim is to describe the upper tail of a distribution. In 
this section we give a very brief overview of the discipline, particularly focus- 
ing on multivariate regular variation and the angular measure. There are a 
number of excellent resources if one wishes to delve more into the theory or 
practice of extreme value analysis. The book by de Haan and Ferreira (2006) 
gives a comprehensive overview of extreme value theory in the univariate, 
multivariate and stochastic process settings. Beirlant et al. (2004) also give 
a thorough treatment of the theory and give a broad overview of recent sta- 
tistical practice. Resnick (2007) focuses on the heavy-tailed case for both the 
univariate and multivariate settings. Coles (2001) gives an approachable in- 
troduction to statistical practice focusing primarily on maximum likelihood 
inference. 

2.1. Extreme value analysis. Extreme value analysis is founded on asymp- 
totic results that characterize a distribution's upper tail by a limited class 
of functions. Statistical practice fits this class of functions to a subset of 
data which are considered extreme. Two approaches for choosing this sub- 
set of extreme data are commonly used: in the first, block (e.g., annual) 
maxima are extracted, in the second, observations which exceed a threshold 
are retained. 

In the univariate case, asymptotic results lead one to model block maxi- 
mum data with a generalized extreme value (GEV) distribution which con- 
solidates the three extremal types [Fisher and Tippett (1928), Gnedenko 
(1943)] into one parametric family. Threshold exceedance data are generally 
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modeled via the generalized Pareto distribution (GPD) or an equivalent 
point process representation. 

Statistical modeling of multivariate extremes is more complicated. Given 
a sequence of i.i.d. random vectors Yj = . . . , Yi^^i)'^ , i = 1,2, . . . , classi- 
cal multivariate extreme value theory considers the vector of renormalized 
element- wise maxima M^zi£il where the division is taken to be element-wise, 

M„ = (Vr=i • • ■ ' Vr=i ^i,<i)^5 ^11"^ V denotes the maximum function. The 
theory shows the distribution of converges to a multivariate max- 

stable distribution (equivalently, multivariate extreme value distribution), 
which we characterize below. For threshold exceedance data, one must first 
define what it means for a random vector to exceed a threshold. Rootzen and 
Tajvidi (2006) define a multivariate GPD which is well-suited to describe 
threshold exceedances in which the threshold has been defined for each vec- 
tor element. For the work below, we employ the framework of multivariate 
regular variation [Resnick (2007)] to describe threshold exceedances. 

2.2. Multivariate regular variation. Multivariate regular variation is a 
notion that is used for modeling multivariate heavy-tailed data. This behav- 
ior is best seen via a natural decomposition into pseudo-polar coordinates. 
If nonnegative Z = (Zi, . . . , Z^)^ is a multivariate regularly varying random 
vector, then the radial component ||Z|| decays like a power function; that 
is, F(||Z|| > t) = L{t)t~°^, where Lit) is a slowly varying function^ at oo and 
q; > is termed the tail index. The angular component, ||Z||~^Z, is described 
by a probability measure that lives on the unit sphere and which becomes 
independent of the radial component as the radial component drifts off to 
infinity. Central ideas in the more detailed treatment that follows are: (1) 
the convergence to a measure A that serves as the intensity measure for a 
limiting point process, (2) that the limiting intensity measure is a product 
measure when described in terms of radial and angular components, and 
(3) the angular measure H which describes the distribution of the angular 
components. 

There are several equivalent definitions of multivariate regular variation 
of a random vector. We say the nonnegative random vector Z is regularly 
varying if 

(1) ^(^:^^A(.) 

^ ' P(I|Z|| >t) ^ ' 

as t — 7- oo, where v denotes vague convergence on C = [0, oo]'^ \ {0} and || • || 
is any norm^ [Resnick (2007), Chapter 3]. For any measureable set A C C 



^L{t) is slowly varying if limt_+oo -77^ = 1- Roughly, L{t) cannot go to zero or infinity 
faster than any power function. 

^The compact sets in C are all closed sets in [0, cxd]'', that do not contain 0, that is, the 
closed sets bounded away from 0. 
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and scalar s > 0, the measure A has the scaling property 

(2) A{sA) = s-"A(A) 

from which one sees the power-function-like behavior. Choosing a sequence 
o„ such IP(||Z|| > an) ~ n~^, one can obtain the sequential version of (1)^: 

(3) nF(—e)^A{-). 



The transformation to polar coordinates R= ||Z|| and W = ||Z||~^Z natu- 
rally arises from the scaling property (2). If Sd-i = {z gC \ ||z|| = 1} denotes 
the unit sphere under the chosen norm, then one can show there exists a 
probability measure H on S^-i such that for any //-continuity Borel subset 
B of Sd-i, 

(4) nP|— >r,WGS) — >r~'^H(B). 

\an ) 

Following Resnick (2007), we refer to H as the angular measure, although 
it is also referred to as the spectral measure. The advantage of the polar 
transformation is that the radial component R acts independently of the 
angular component W whose behavior is captured by K. 

From (4), one can characterize the tail behavior of Z if one knows (or can 
estimate) a and H . However, without further assumptions, this proves to be 
difficult, as K can be any probability measure on iS^-i. For simplification 
purposes, in multivariate extremes it is often assumed that the components 
Zj,j = l,...,d, of the random vector have a common marginal distribution, 
not just the common tail index that is implied by the general conditions of 
multivariate regular variation [e.g., Resnick (2002), Section 2]. There is no 
loss in generality by assuming specific margins [Resnick (1987), Proposition 
5.10]. For the remainder, we will assume Z = (Zi, . . . , Z^)'^ is regularly vary- 
ing with tail index a = l and that Zj,j = l,...,d, have a common marginal 
distribution. Under this assumption, it follows that 

(5) / wi dH{w) = I Wj dH{-w) for j = 2, . . . , d, 

providing some structure to the angular measure H . Furthermore, when 
a = 1 it is particularly useful to choose the Li norm: ||z|| = zi + • • • + z^, 
for which the unit sphere is the simplex Sd^i = {w G C : t^i + • • • + w;^ = 1}. 
With this norm, 1 = /^^ ^ dH{w) = J^^ ^{wi H ^ Wd) dH{w) and, hence, 

■f'sd ^'^j '^Hiyj) =d~^. In practice, the assumption of common marginals 
(or, for that matter, a common tail index) is rarely met. If the data that one 



^Other normalizing sequences are sometimes used; see Resnick (2007), page 174. 
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intends to model arise from a d-dimensional random vector Y for which the 
regularly varying a = l and common marginal assumptions do not hold, we 
presume there exist probability integral transforms Tj such that Tj (Yj ) = Zj 
for j = 1, . . . ,d. This preprocessing of the random variables is common in 
extreme value analyses [e.g., Cooley, Davis and Naveau (2010), Coles and 
Tawn (1991)] and can be viewed analogously to the preprocessing required 
to fit a stationary model to time series or spatial data. 

One recognizes that (3) is the classic relationship characterizing con- 
vergence to a Poisson process, and it is often useful to think in terms of 
point processes when describing multivariate regular variation. From (3), 
the sequence of point processes Nn consisting of point masses located at 
Tii/ttn, i = 1, 2, . . . , n, where Zj are i.i.d. copies of Z converges to a nonhomo- 
geneous Poisson process N with intensity measure A(-) on B(C) [Resnick 
(2007), Section 6.2]. We denote the corresponding intensity function by 
A(dz), where A{A) = J^A{dz). From (4), in terms of polar coordinates, 
A{dr X dw) = r~'^ dr H (dw) . If the angular measure H is differentiable, then 
we refer to the angular density /i(w), and A{dr x dw) = r~'^ h(w) dr dw . 

Multivariate regular variation is a useful way to characterize the joint 
upper tail of a random variable Y for a number of reasons. First, interest 
in extreme behavior is often greatest in cases when the tails are believed 
to be heavy (i.e., having asymptotic behavior like a power function), and 
multivariate regular variation provides a mathematical framework for such 
behavior. Even when the tails do not share a common tail index, marginal 
transformations as described above can be employed to utilize the frame- 
work. Second and more importantly, the angular measure H specifically 
describes the dependence found in the tails. Since our interest is in per- 
forming prediction when the observations are large, it is natural to use a 
framework specifically designed for describing tail dependence. 

We perform prediction by approximating the conditional density. To do 
so, we will rely on a model for the angular measure H, and this model must 
be able to be evaluated for any w S Sd-i- There have been several paramet- 
ric models proposed for H which meet the moment conditions (5). An early 
parametric model was the tilted Dirichlet model of Coles and Tawn (1991). 
Recently, Cooley, Davis and Naveau (2010) and Ballani and Schlather (2011) 
employed a geometric approach to construct new parametric models. A semi- 
parametric model via a mixture of Dirichlet densities was introduced by 
Boldi and Davison (2007). Model fitting is done by Coles and Tawn (1991), 
Cooley, Davis and Naveau (2010) and Ballani and Schlather (2011) via a 
likelihood based on the point process representation for multivariate regular 
variation, while Boldi and Davison (2007) use both a Bayesian MCMC ap- 
proach as well as an EM approach to fit their mixture model. Once fit, any 
of these models could be used for H in the prediction procedure we outline 
in Section 3.2. 
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There is further justification for using the framework of multivariate reg- 
ular variation for modeling extreme values. The multivariate max-stable 
distributions obtained by the classical theory can be characterized by the 
angular measure H. If one assumes that the marginals of the limiting dis- 
tribution are unit Frechet [F{Z < z) = exp(— z~"^)]; that is, the domain of 
attraction of all regularly varying random variables with a = 1 , then 

-d[ V (t) • 

Here the normalizing sequence 6„, = a„,/d to obtain the unit-Frechet marginals 
There have been parametric models developed which give closed-form ex- 
pressions for subfamilies of multivariate max-stable distributions such as 
the asymmetric logistic [Tawn (1990)] and negative logistic [Joe (1990)], 
and these can be used to fit block maxima. Besides being max-stable, these 
distributions are multivariate regularly varying and we later use the logis- 
tic model [Gumbel (I960)] to simulate random vectors whose distribution 
function and limiting angular measure are both known in closed form. 

3. Conditional distribution estimation and prediction for extremes. 

3.1. Previous work in prediction for extremes. There has been a small 
amount of work which has tried to devise methods for performing prediction 
for extremes. Davis and Resnick (1989, 1993) define a distance d between 
the components of a bivariate max-stable random variable, and suggest a 
method of prediction which minimizes the distance between the observed 
component and the predictor. Craigmile et al. (2006) offer a geostatistical 
approach to the problem of determining exceedances in a spatial setting by 
adjusting the loss function of the kriging predictor. 

A recent important advance in the area of approximating a conditional 
distribution for extremes is the work of Wang and Stoev (2011), and we view 
the work in this paper as complementary. Wang and Stoev perform predic- 
tion for the case of max-stable random vectors. Let M^'^"'"^^ = (M^'^\ M^^)-^, 
where M^f^ = (M„,i, . . . , M„,d)^, M^f^ = (M„,i, . . . , M„,p)^, and where 
-|y|-(d+p) assumed to be a max-stable random vector with a known dis- 
tribution. Given data m^'^'' = (m„^i, . . . , m„ ,i)^, Wang and Stoev obtain ap- 
proximate draws from | M.[f ^ = m^'^^ . They accomplish this by sam- 
pling from spectrally-discrete max-stable models which can be represented 
as a max- linear combination of independent random variables. Using a spec- 
trally discrete model would seem to be limiting, as it would imply that the 
corresponding angular measure would only have mass at discrete locations. 
However, it is known that any multivariate max-stable distribution can be 
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approximated arbitrarily well by a max-linear model with a sufficient num- 
ber of elements, and Wang and Stoev claim that their computational method 
can handle max-linear combinations on the order of thousands. Wang and 
Stoev (2011) apply their approach in the spatial setting and the results show 
the discrete approximation performs quite well. 

The method we propose in the next section differs from that of Wang 
and Stoev (2011) in a number of important ways. Perhaps the most impor- 
tant difference is that, rather than performing prediction in a max-stable 
setting which would lend itself to data that are block maxima, our pre- 
diction method is best suited for large observations, that is, the threshold 
exceedance case. Another difference is that, rather than successively draw- 
ing from the conditional distribution as Wang and Stoev do, we provide 
an analytic approximation to the conditional density given the observations 
are sufficiently large. Additionally, rather than relying on an approximation 
which corresponds to a discrete angular measure, our method instead relies 
on a parametric or semi-parametric model for the angular measure. Both 
methods involve nontrivial computation, although our method requires only 
the numerical computation of a one-dimensional integral, whereas Wang 
and Stoev's approach requires computation in fitting an adequate discrete 
approximation to the spectral measure and then in drawing from the condi- 
tional distribution. 

3.2. Approximating the conditional density when observations are large 
via the angular measure. Let Z„d = (Zi, . . . , Z^-i)'^ , and define z_rf analo- 
gously. Working with the Li norm, our goal is to approximate the distribu- 
tion of [Zfi I Z_d = z_d] when ||z_rf|| > and r^ is large. Let us assume that 
H is absolutely continuous with respect to the Lebesgue measure on Sd-i 
and let h denote the corresponding density. 

To approximate the conditional density, we employ the conditional p.d.f. 



where z = {zi, . . . , z^-i, z^)"^ and z(t) = (zi, . . . , z^-i, t)-^. This approxima- 
tion arises from the point process representation for a regularly varying 
random vector as we show below. Consequently, the approximate condi- 
tional distribution utilizes the angular measure H, which characterizes the 
dependence in Z's components when ||Z|| is large. 

The first step in justifying the approximation (7) is to characterize the 
limiting measure A(-) in terms of Cartesian rather than polar coordinates. 

Proposition 1. Assume Z is d-dimensional multivariate regularly vary- 
ing with common marginal distributions, tail index a = l, and angular den- 
sity h. Let Nn denote the sequence of point processes consisting of the point 
masses located at {Zi/an,i = 1,2, ... ,n} , where Zi are i.i.d. copies of Z, and 



(7) 



f: 




j^Ut)\\~i'^^^)h{z{t)/\\z{t)\\)dt 
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let N be the limiting point process as n— t- oo. Denote the intensity measure 
of N by K{-). T/ien A(dz) = ||z||-('^+i)/i(z||z||-i)(iz. 

Proof. The proof is a simple change-of- variables argument. Define the 
transformation p : (0, oo) x Sd~i i— )• C by z := p[r, w) = rw and note that p is 
the inverse of the usual Cartesian-to-polar coordinate transform. To make 
the change of variables, we need |det Jp-i | . It is known that |det Jp\ = r^'^"^) 
[Hogg, McKean and Craig (2005), Example 3.37 (specific for the Li-norm) 
and Song and Gupta (1997), Lemma 1.1 (for the general Lg-norm)]. Thus, 



Let A be an arbitrary set bounded away from {0}, and consider A(yl): 



Thus, A((iz) = ||z||-('^+^)/i(z||z||-i) dz. □ 

Remark. The result is similar to Theorem 1 in Coles and Tawn (1991), 
which allows one to start with a known multivariate max-stable distribution 
with unit Frechet marginals and find its corresponding angular measure. 
Coles and Tawn state "the drawback to the use of theorem 1 is that it 
can be applied only to MEVDs, of which very few have been obtained" 
(page 381). It is important to note that our aim is somewhat the reverse of 
Coles and Tawn: we wish to start with an angular measure, and obtain an 
approximation for the (conditional) density given the observed values are 
large. 

Now, for any tq > and z € M*^ such that ||z|| > tq, let 




z 



(d-i) 







Then, 



'^Z/a„(z,ro) 



P((Z/a„)£ [z,oo),(||Z||/a„)>ro) 
P((||Z||/a„)>ro) 



nF((Z/a„) £ [z,oo)) 
nP((||Z||/a„)>ro) 
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(8) =r,A([z,oo)) because / r-* = r- 

(9) ^rj |Nr«'«)Mz||z|r')<,z seePropoB^tol. 

J [z,oo) 

We wish to speak of fz/ani'^^'''o)^ a joint density of Z/a„ given ||Z||/a,„ > 
tq. Heuristically from (9), we will assume that 

(10) /z/a„(z,ro)^ro||z||-("'+i)/i(z||z|r') for||z||>ro 

as n — 7- oo. More specifically, the convergence would be guaranteed if ^-Fz/a„ {^^ 
ro) converged uniformly to ro||z||~'^'^"*"^)/i(z||z||~^), allowing us to switch the 
order of the limits associated with differentiation and as n — )• oo. See also 
Theorem 6.4 in Resnick (2007) in which regularly varying densities are de- 
scribed. 

Example (Bivariate logistic distribution). Let Z have c.d.f. P(Zi < 

-2^1) -^2 < Z2) = exp[— (z-^ + Z2 for /3 S (0, 1]. Z is then said to have 

a bivariate logistic distribution, which is a known multivariate max-stable 
distribution, and which (more importantly for our purposes) is also regu- 
larly varying with common unit-Prechet marginals F(Zj < z) = exp(— z"^) 
for j = 1,2. Coles and Tawn (1991) show that the angular density of the 
bivariate logistic is given by 



Mw) = ^(i-l)(^^;l^z;2)-l/^-l« 



For a bivariate regularly varying random vector with unit Frechet margins, 
it can be shown that = 2n is a normalizing sequence such that P(||Z|| > 
On) ~ n~^. Now, 

— G [z,oo) ) = F(Zi > 2nzi,Z2 > 2nz2) 
2n J 

= 1 - exp(-(2nzi)-^) - exp(-(2nz2)"^) 
(11) + exp[-((2nzi)"i/^ + {2nz2r'^^f] 

= (2nzi)-i + i2nz2)~^ - ((2nzi)"V/3 + (2nz2)"'/^)^ 
+ o(n"^) 
and, hence, for ||z|| > ro, 

i^z/2n(z, ro) ^ iro(zr' + ^2' " (^1"'^" + ^2'^^)^)- 
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Differentiating tliis, we obtain tlie density 



>--l)(..+.)-3((-il-)- + 

Z\ \ ' I Z2 ^ 



Zl + Z2 



^Zl+ Z2 J \Zi+ Z2 

= ro||z||~3/i(z||z||~"'"), 

wliich agrees witli (10). Similar arguments could be made for logistic mod- 
els of dimension d> 2, but the inclusion/exclusion argument made in (11) 
becomes tedious. 

Now, let us assume n is fixed, but large enough such that /z/a„ (z, tq) ~ 
ro||z||~('^"''^)/i(z||z||~-^). We wish to approximate /z(z,r*), the density of Z 
given that ||Z|| > r^, where r* is large. To obtain an approximation, we do a 
change-of- variables from Z/a^ to Z, which yields 

/z(z,r,)paro||z/a„|r('^+i)/i(z||z|ri)a-'^ 

(12) 

= r,||z|r(''+i)/i(z||z||-i), 

where = a^ro, and thus is large. 

Finally, consider the conditional distribution of [Z^, \ T^-d = z_rf] when 
||z_^|| > r^, and r* is large. Integrating to normalize the conditional density 
yields (7). 

3.3. An approximation example. We investigate our approximation 
method via an example with a known distribution and angular measure. 
The trivariate logistic is a random vector with distribution ¥{Zi < zi,Z2< 
Z2,Z3 < zs) = exp[-(0i"^/^ + z^^/^ + z^^^'^f)] for /3 G (0,1]. The angular 
measure of the trivariate logistic is given by 

Mw)4(i-i 

(13) 

We first investigate the quality of the approximation as ||z_rf|| increases 
and when /3 = 0.3. Since both the distribution and the angular measure are 
known, we can compare the approximated conditional density from (7) to 
the actual conditional density. The first three panels of Figure 2 show how 
the approximation improves as the magnitude of the observations grows. 
The top left panel shows that when the observed values are small {zi = 



(I - 1) {w^w2wsr'/^-' 
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0.0 0.5 1.0 1.5 2 4 6 8 10 

(0.23, 0.24, 0.31) (1.66, 2.06, 1.16) 




50 100 150 200 0.0 0.2 0.4 0.6 0.8 1.0 



(28.37, 23,57, 18,82) 

Fig. 2. Upper left, upper right and lower left panels show three approximations of the 
conditional density of the third component of a trivariate logistic random variable given 
the first two components. The true conditional density is shown with the dotted line, the 
approximated density with the solid line. Note the different scales for the horizontal axis for 
these three figures. The approximation is poor when the observed values are small (upper 
left), but improves as these values become larger (upper right, lower left). The bottom right 
panel shows the PIT histogram of the largest 1000 (as determined by zi + Z2) of 5000 
total simulations. As the PIT histogram is fiat, it shows that the approximation is good 
for these large observations. Dotted lines indicate the approximate 0.05 and 0.95 quantiles 
for the sampling distributions of each bin under the null hypothesis that the conditional 
distribution is correct. 

0.23, Z2 =0.24), the approximation to the true conditional density is poor. 
However, as the next two panels show, when the observations are sufficiently 
large, the approximation is quite good. 

Next we use a simulation experiment to assess the skill in using (7) for 
approximating the conditional density when the conditioning observations 
are extreme. From the R package evd [Stephenson (2002)], we simulate 5000 
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trivariate logistic random vectors with /3 = 0.3. Let Zj = (Zj^i,Zj^2)-^j,3) , 
i = 1,...,5000, denote the i.i.d. random variables and Zj = {zi^i, Zi^2, Zi^s)"^ 
be the realized values, of which only Zj^i and Zj^2 are initially observed. 
We rank the realizations Zj according to the sum of the observed values 
Zi,i + 2i,2- We then apply our approximation method to the largest 1000 
of these simulations which corresponds to the condition Zj^i + Zi^2 > 8.7. 
As each simulated random vector results in a unique conditional density 
approximation, we assess our method via a probability integral transform 
(PIT) or rank histogram [Gneiting, Balabdaoui and Raftery (2007), Wilks 
(2006), Section 7.7.2]. Let fzi,3\Zi^i,Zi,2i^i,3 I 2^i,ii^j,2) be the approximated 
conditional density given by (7). On simulation i, if the observed values are 
large enough, we let pi = X!'^ /z,,3|z,,i,z,,2 (•s I Zi^i, Zi^2) ds, where Zi^3 is the 
(previously unobserved) value of Zi^^. We then construct a histogram for the 
values Pi. If the approximation is well-calibrated, then the PIT histogram 
should be flat, since there should be equal probability of pi occurring in each 
bin. If the conditional density were correct, the counts in each bin would have 
a binomial (n = 1000, p = 0.1) distribution and approximate quantiles for 
the sampling distribution can be generated under this null hypothesis. The 
bottom right panel indicates that the approximation seems to be quite good 
given that the observations are large, and given that the angular measure is 
known. 

4. Application to nitrogen dioxide air pollution measurements. The ni- 
trogen oxides (NOx) constitute one of the six common air pollutants for 
which the US EPA is required to set air quality standards by the Clean Air 
Act. Of the various nitrogen oxides, nitrogen dioxide (NO2) is the compo- 
nent of "greatest interest" and is used as an indicator of the entire group of 
NOa;.'^ According to the EPA fact sheet [EPA (2010)], short term NO2 expo- 
sures have been shown to cause adverse respiratory effects such as increased 
asthma symptoms. In January 2010, a new 1-hour NO2 standard was set at 
100 parts per billion (ppb) to protect against adverse health effects due to 
short-term exposure to NO2. 

Under the guidelines set by the EPA's Ambient Air Monitoring Pro- 
gram,*^ state and local agencies are charged with establishing and main- 
taining a network of air pollution monitoring stations. The EPA has made 
available data from these stations. Using an online tool,^ we collected data 
from five stations located in Washington DC and nearby Virginia which 
were all active during the entire period from 1995-2010. The stations were 
Alexandria (site ID: 51-510-0009), McMihan (11-001-0043), River Terrace 



'^http : // www . epa . gov/ air/nitrogenoxides/ . 
*http : //epa. gov/airquality/qa/monprog .html . 
'http : / / www . epa . gov/ airdata/ . 
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Fig. 3. Time series plot of the retained measurements at the Arlington station. 

(11-001-0041), Takoma School (11-001-0025) and Arlington (51-013-0020). 
The locations of these stations are shown in Figure 1. 

Of course, air pollution measurements are of most interest when levels 
are believed to be high, and monitors only record pollutant levels at specific 
locations. We test our prediction method when observations are large on 
these five Washington-area stations. We aim to predict the NO2 level at the 
Arlington station, given the NO2 measurement at the other four stations. 
We choose NO2 because all five stations measured this pollutant, and NO2 
appears to have the heaviest tail of the pollutants we examined. 

From each of the five stations, we extract the daily maximum NO2 mea- 
surement; all of the stations have over 5000 daily NO2 measurements recorded 
between Jan 1, 1995-Jan 31, 2010 which meet EPA's daily summary quality 
requirements. Prom these, we keep only days in which all five stations have 
measurements, resulting in 4497 daily measurements. Finally, because the 
data are truncated to the nearest ppb, the empirical c.d.f. appears quite dis- 
crete. Thus, we add a uniform random variable on the interval [—0.5,0.5] to 
the data so that they behave more like the underlying continuous variable.^'' 

Figure 3 shows the time series of the retained measurements at the Arling- 
ton station. Unlike other pollutants such as ground-level ozone, there does 
not appear to be a strong seasonal effect for N02. Although a very weak 
seasonal signal is detectable for a moving-average smoothed time series, this 
signal is hard to discern from a smoothed periodogram. It also appears that 
NO2 levels have decreased at this site over the study period, and the other 
stations show a similar, but weaker, trend. Checking for serial dependence 
in the data, we find the sample autocorrelation function of the deseasonal- 
ized data shows a highly significant correlation only at lag 1 [/5(1) = 0.35]. 
Figure 4 shows scatter plots of the measurements at the Arlington station 
versus the four stations used for prediction. The strong positive correlation 



Data available at http://www.stat.colostate.edu/~cooleyd/DataAndCode/ 
PredExtremes/. 



CONDITIONAL DENSITY APPROXIMATION FOR EXTREMES 



15 



^nl Un^ 



ril I In, 



rii Un. 







20 40 60 80 100 120 20 40 60 
alx mc 



100 120 20 40 60 80 100 120 20 40 60 80 100 120 1000 



Fig. 4. Scatterplots of the measurements at the Arlington (arl) station verses the other 
four stations. 



between NO2 measurements shown in Figure 4 is indicative of that found 
among all pairs, with the strongest sample correlation being 0.83 between 
Arlington and Alexandria and the weakest being 0.66 between Alexandria 
and McMillan. Figure 4 also shows that largest values can be coincident 
between stations. In our analysis that follows, we assume that the depen- 
dence in the upper tail of the joint distribution of NO2 measurements is not 
affected by the weak seasonality or the trend found in the data. We checked 
the trend assumption by fitting angular measure models to the first half 
and second half of the multivariate time series separately and found simi- 
lar parameter estimates. We also ignore the serial dependence in the data, 
predicting the Arlington station's measurement using only the other four 
stations' measurements from that day. 

Let Yt = (5^,1, ■ • • , ^,5)"^ represent the random vector of measurements on 
day t at the five locations. Our first task is to estimate the angular measure 
which describes the tail dependence of the NO2 measurements at these loca- 
tions. As formulated in Section 2.2, angular measure models assume a com- 
mon marginal distribution with tail index a = 1 . To obtain common marginal 
distributions, we use the following procedure. Mean residual life plots [Coles 
(2001), Section 4.3.1] are used to select the 0.93 quantile as an appropri- 
ate threshold above which each location's data were approximately Pareto- 
distributed. At each location, a generalized Pareto distribution (GPD) is 
fit to the data above the threshold via maximum likelihood. Letting Fj be 
the estimated marginal distribution function formed by using the empirical 
distribution below the threshold and the fitted GPD above (appropriately 
weighted by the observed exceedance probability), each location's data are 
transformed to have a unit Frechet distribution: Ztj = {—\og{Fj{Ytj))~^ . 
Table 1 summarizes the marginal tail estimates. 

The data are divided into a training set and test set. The training set, 
consisting of two-thirds of the available data (ntrain = 2998), is used to fit 
a five-dimensional angular measure model. The test set, consisting of the 
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Table 1 

Threshold and GPD estimates (and standard errors) of the 
tail for the five Washington DC area locations. The GPD is 
parametrized P{Y > y\Y > u) = {1 + C^)^^''*- 
If ^> 0, the tail index q = 1/^ 



Site # 


Location 


u 




i 


1 


alx 


59.44 


7.78 (0.64) 


0.07 (0.06) 


2 


mc 


56.80 


8.29 (0.70) 


0.05 (0.06) 


3 


rt 


59.69 


8.96 (0.78) 


0.10 (0.07) 


4 


ts 


55.51 


6.67 (0.55) 


0.02 (0.06) 


5 


arl 


57.97 


7.56 (0.62) 


0.07 (0.06) 



other one-third (ntest = 1499), is used to test the prediction method. Because 
of the decreasing trend, we construct the test set by extracting every third 
observation so that both the training and test sets would reflect the behavior 
over the entire study period. 

The pairwise beta model [Cooley, Davis and Naveau (2010)] is an angular 
measure model for dimension d> 2 with parameters which help to control 
the amount of dependence between each pair of elements in the random 
vector. We fit the pairwise beta via maximum likelihood, and the likelihood 
arises by assuming that the point process relationship implied by (4) is exact 
for large observations [Coles and Tawn (1991), Cooley, Davis and Naveau 
(2010)]. The largest observations were determined by ||zt||, that is, the radial 
component of the transformed data, and the largest 210 observations (0.93 
quantile) were used to fit the model. 

The pairwise beta has angular density given by 

(14) h{w;j,p)=Kd{j) Yl %fe(w;7,,5j-fc) for < < 1 

^<j<k<d 

where 

%fc(w;7,/3,-,fc) = {wj + wkf^-\l - {w,+wk)r^^~'^"'^' 



X 



r(2/3,-fc)/ Wj Y-"-^/ Wk ^ 



and 



^'^{/3j,k) \Wj + WkJ \Wj+Wk 

2{d-3)l r(7d+l) 
d(d^T)7^ f(2^Tl)f(7(d^^ 



is a normalizing constant. The estimated parameters for the fitted model 
are given in Table 2. In the pairwise beta model the magnitude of the Pij 
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Table 2 

Parameter estimates (and standard errors) for the pairmse beta angular measure model 
fit to the Washington DC NO2 data 



7 


/3l,2 


01,3 


01,4. 


01,5 


02,3 


02,4 


02,5 


03,4. 


03,5 


04,5 


0.37 


0.51 


0.64 


0.56 


6.11 


0.76 


1.64 


0.96 


0.56 


0.98 


1.01 


(0.03) 


(0.18) 


(0.28) 


(0.19) 


(2.59) 


(0.44) 


(1.08) 


(0.51) 


(0.20) 


(0.51) 


(0.61) 



parameter is related to the level of dependence between the ith and j com- 
ponents; the fact that /3i^5 is the largest indicates that the Alexandria and 
Arlington stations show the strongest tail dependence. 

For the test set, we assume that the Arlington station is not observed, and 
aim to approximate the conditional density of this station's NO2 measure- 
ment given the measurements at the other four stations. Since our method 
is only valid when the observations are large, we perform prediction for 
the 105 test-set observations with the largest values of ||zj^_5||. That is, we 
threshold at the empirical 0.93 quantile of the radial component (sum) of the 
transformed data at the observed locations. Using the fitted pairwise beta 
angular measure, the conditional density fzt 5\Zt -5{^t,5 \ 24,-5) was approx- 
imated using the procedure described in Section 3.2 for each of these top 
105 observations.^^ The integration in the denominator of (7) was approx- 
imated using Simpson's Rule. These were then back-transformed to obtain 
the conditional densities on the original scale gyt^lYt -.5(yt,5 I Yt.-s)- Three 
of the approximated conditional densities can be found in the top row of 
Figure 5. 

We compare our prediction method to two other approaches: best linear 
unbiased prediction (kriging) and indicator kriging [Cressie (1993), Schaben- 
berger and Gotway (2005)]. Kriging is a prediction method that utilizes only 
mean and covariance information. At its most fundamental level, kriging 
does not make a distributional assumption, it provides a point prediction 
which corresponds to the best linear unbiased predictor in mean-square pre- 
diction error (MSPE), and additionally provides an estimate of the MSPE. 
To obtain confidence intervals, typically a Gaussian assumption is made. 
Furthermore, if one assumes the data arise from a Gaussian process, then 
the kriging estimate and MSPE correspond to the conditional expectation 
and variance. Since our method generates a conditional distribution, we will 
compare it to the conditional distribution provided by kriging under a Gaus- 
sian assumption. 

Our kriging procedure is in parallel to the angular measure procedure 
above. The training set is used to formulate a model; here all 2298 obser- 



^^The code used to produce the results is available at http://www.stat. 
colostate . edu/~cooleyd/DataAndCode/PredExtremes/PredExtremesFiles . zip. 
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Fig. 5. Comparison of the approximated conditional densities at the Arlington site given 
the measurements at the other sites: ^iy, _5 (yt,5|yt,-5) for three different days with 
high measurements. Top figure in each column is the approximated conditional density via 
the angular measure, middle figure is from simple kriging, and the bottom figure is from 
indicator kriging. Below each figure is the vector of actual measurements at all five sites. 
The fifth element corresponds to the Arlington site which we are trying to predict and 
which IS plotted with a dot in the figures. The dotted line in the upper left corresponds to 
the marginal distribution for the Arlington site. 

vations are used to estimate the mean NO 2 levels at all five locations as 
well as to estimate the covariance matrix between the measurements at the 
locations. It is important to note that no spatial covariance function is fit, as 
our training set allows us to estimate the covariance matrix directly. Treat- 
ing the mean and covariance as known, we then use simple kriging [Cressie 
(1993)] to obtain a point prediction at the Arlington location given the mea- 
surements at the other locations for the same 105 large observations in the 
test set. The MSPE is calculated from the estimated covariances, we use it 
to obtain the estimated conditional density at the Arlington location given 
the other measurements under a Gaussian assumption. 

We also compare to indicator kriging [Cressie (1993), Schabenberger and 
Gotway (2005)] which is a nonparametric version of kriging designed to 
provide estimates of P(ld > u\Yi, . . . ,y^_i) for a given threshold u. When 
performing indicator kriging, one first needs an estimate of the covariance 
matrix of the random variables corresponding to the indicators 1(1^- > u) 
for all the j locations. At time t, given observations yt^i, ■ ■ ■ ,yt,d-i, these 
are converted into indicators, ^{ytj > u) and ordinary kriging is used to 
estimate E[I{Yt^d > u) \ I(yt,i > n), . . . ,I(yt,d_i > u)] = ¥{Yt^d > u \ I(yt,i > 
u), . . . ,I{yt,d-i > u)). Repeating the analysis for various values of u allows 
one to estimate a conditional distribution, although there is no guarantee 
that the estimate will be monotonic. 
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Our indicator kriging analysis is again parallel to the angular measure 
and simple kriging analyses. We let u vary from 10-105 ppm with a step 
size of 0.25 ppm which covers the range of observations. The training set is 
used to estimate the covariance matrix of indicators at the various levels of 
u, and then indicator kriging is performed on each of the sets of observations 
in the test set. To guarantee that the conditional distribution is monotonic, 
we then perform a monotone quadratic smoothing spline regression [Meyer 
(2008)] on the estimates ¥{Yt^d > u \ l{yt,i >u),.. ■ ,l{yt,d-i > u)) for all the 
values of u. Densities are obtained by differentiating the smoothing spline. 

Conditional densities obtained by the angular measure method, simple 
kriging and indicator kriging are shown in Figure 5 for three different days' 
data. In these three figures the conditional density approximated via the an- 
gular measure is less concentrated than the conditional density from simple 
kriging, and that proves to be the case in general. The angular measure can 
also be somewhat skewed or slightly bimodal depending on the combination 
of the observed measurements. Although indicator kriging is performed for 
each pollution level n, the conditional density as approximated by indicator 
kriging is very rough, as there are only four locations. 

We evaluate the performance of the three approaches using various meth- 
ods. All comparisons are done at the original scale. To test the overall fit 
of the approximated conditional density, we again use the PIT histogram. 
Figure 6 shows the PIT histograms for all three methods. The PIT his- 
togram for the angular measure method is relatively flat with perhaps some 
indication that the model is overestimating the probability in the lower tail, 
resulting in too few observations falling in the first decile of the approxi- 



Approx. w/ Angular Measure Simple Kriging Indicator Kriging 




I 1 1 1 1 1 I 1 1 1 1 1 I 1 1 1 1 1 

0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 

quantile quantile quantile 



Fig. 6. PIT histograms for the angular measure approach (left), simple kriging (cen- 
ter) and indicator kriging (right). Perfect estimation of the conditional density would be 
indicated by a flat histogram. Error bars are obtained for each decile from a binomial 
distribution {ri— 105,p = 0.10). 



20 



D. COOLEY, R. A. DAVIS AND P. NAVEAU 



mated conditional density. This could be due to the angular density model: 
the pairwise beta model fit to the data is certainly not the true model for 
the angular density which is unknown. It could also be due to threshold 
choice, although the parameter estimates of the pairwise beta model did 
not appear to be sensitive to the threshold. The kriging estimate with the 
Gaussian assumption shows a classic u-shape associated with underdisper- 
sion [Wilks (2006), Section 7.7.2]. This model underestimates the probability 
of the observation occurring in the lower tail and also the upper tail. Indica- 
tor kriging also appears to underestimate the upper tail of the distribution, 
resulting in too many observations appearing in the highest decile of the 
approximated conditional distribution. Using the terminology of Gneiting, 
Balabdaoui and Raftery (2007), the PIT histograms indicate that the angu- 
lar measure method is better (probabilistically) calibrated, particularly in 
the upper quantiles of the predictive distribution. 

Another performance evaluation is to see how well each method estimates 
a quantile, and, particularly, a high quantile. For instance, regulators might 
wish to have an accurate assessment of a high quantile of an unmonitored 
location given large observations nearby. Such an estimate could be used as 
a probabilistic upper bound, that is, ofhcials could state that they were 95% 
confident that the level at the unmonitored location was below a reported 
level. For each of the 105 large observations, we use the approximated con- 
ditional density from all three methods to estimate the 0.99, 0.95, 0.90, 0.75 
and 0.50 quantiles. We examine coverage by calculating the proportion of 
actual observations that fell beneath these quantiles and also calculate each 
method's quantile verification score (QVS) [Gneiting and Raftery (2007)]. 

Let g(^\\Y^_^{s I yt-b) and G'^^|y^__.(s | yt-s) be the predictive density 

and cumulative distribution function for Yj^s | ^t-b = yt,-5i where m de- 
notes the angular measure method (m = 1), kriging (m = 2) or indicator 
kriging [m = 3). We have parameterized the QVS as in Friederichs and 
Hense (2007), 

105 

Qy5(-)=j;p.(y,5-^j::?)), 

t=l 

where 4t? = G^^%t and pr{u) = tuI{u > 0) + (r - l)ul{u < 0). 

A lower QVS score indicates better skill, but the scale of the QVS score 
depends on the quantile to which it is being applied. The QVS is a proper 
scoring rule, meaning that it is minimized if the predictive distribution corre- 
sponds to the "true" distribution. Both the coverage and QVS results for the 
tested quantiles are shown in Table 3, as well as the sampling error assuming 
independent Bernoulli trials with p equal to the given quantile. The angular 
measure method does a superior job of estimating the high quantiles (0.99, 
0.95 and 0.90) when the observations are large, whereas both simple kriging 
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Table 3 

Gives the skill of the different methods for assessing high guantiles. Coverage (Cvg) 
column reports the proportion of the observations at the Arlington location that fell 
beneath the guantile as calculated from the estimated conditional density and QVS 
column reports the quantile verification score (lower is better) 

Quantile 

0.99 0.95 0.90 0.75 0.50 



Cvg QVS Cvg QVS Cvg QVS Cvg QVS Cvg QVS 

Angular 0.97 40.97 0.93 134.77 0.88 225.68 0.70 398.97 0.44 502.51 

measure 

Simple 0.92 65.80 0.83 170.04 0.81 246.26 0.65 378.27 0.54 444.84 
kriging 

Indicator 0.90 67.80 0.86 153.41 0.83 238.63 0.73 377.20 0.49 452.68 
kriging 

Sampling (0.01) - (0.02) - (0.03) - (0.04) - (0.05) 
error 



and indicator kriging underestimate these high quantiles. The angular mea- 
sure method seems to be outperformed by indicator kriging for the 0.75 and 
0.50 quantiles, although its coverage rates fall well within acceptable ranges 
when sampling error is accounted for. 

Each method's conditional density is essentially a probabilistic forecast, 
and scoring rules have been developed which provide an overall measure of 
the quality of probabilistic forecasts [Gneiting and Raftery (2007)]. We as- 
sess the methods using two different proper scoring rules: the logarithmic 
score and the continuous rank probability score (CRPS). The logarithmic 

score for the prediction at time t is given by —\og{g^l^-y^ Syt,5 \ yt,-5)), 
where yt^5 is the actual observation at the Arlington station and g^j^^^.^ ^ is 

the estimated conditional density via the angular measure approach (m = 1), 
kriging (m = 2) and indicator kriging (m = 3) . The logarithmic score has an 
information-theoretic basis and corresponds to the Kullback-Leibler diver- 
gence between the predictive density g^^^^^^ ^ [s \ and the Kronecker 

delta function ds,yt,5- W^e assess the methods by the mean of the logarithmic 
scores 

^ 105 

— 5^ - log(<7g|Y,_, I yt,-5)) 

t=l 

over assumed independent realizations yt, t = 1,...,105. The mean loga- 
rithmic score is 3.93 for the angular measure approach, 4.24 for kriging, and 
infinity for indicator kriging, as 8 of the 105 of the observations yt^5 fall out- 
side the support of the predictive distribution. Since a lower score is better, 
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the angular measure method outperforms kriging and indicator kriging by 
this performance measure. 

The logarithmic score has been criticized, as it is not "robust" to cases 
when observations fall outside the support of the distribution as in the in- 
dicator kriging case above. A popular alternative is the CRPS. For our ex- 
ample, the CRPS for a particular day t is given by 

/oo 
{G^^l\^^Js\yt,^,)-I{s>yt,5}fds 
-oo 

and can be understood as a nonlinear function of the area between each 
method's predictive c.d.f. G^y^I^^^ ^{^ I yt.-s) ^'^d the heavyside function 
associated with the realized value yt^5. The CRPS score rewards appropriate 
centering of the predictive distribution and narrowness of the predictive 
distribution otherwise known as "sharpness." We assess the three methods 
by the mean of the CRPS scores for t = l,..., 105. Given the PIT histograms 
and logarithmic scores, it is perhaps surprising that the mean CRPS scores 
associated with the angular measure method, kriging and indicator kriging 
are 6.83, 6.36 and 6.21, respectively, indicating that by this performance 
measure, the angular measure method is performing worst. While all three 
methods produce predictive densities that are centered (i.e., the realized 
values exceed the predictive density's median about half of the time), the 
predictive densities from kriging and indicatior kriging are sharper than 
(but not as well calibrated as) the density produced by the angular measure 
method (Figure 5). 

The CRPS score can be written as an integral with respect to a threshold 
s as in (15) or, equivalently, in terms of the quantile function G^l^^^ 5~^^P^ 

and integrated with respect to p G (0, 1) [Gneiting and Ranjan (2011)]. Fur- 
ther, the overall mean CRPS score can be decomposed into a mean CRPS 
score at each p, then integrated with respect to p. Gneiting and Ranjan 
(2011) suggest plotting the quantile score verses p as a diagnostic tool. When 
done for the three forecasts. Figure 7 shows that the angular measure method 
outperforms the other methods for high quantiles, but both kriging and in- 
dicator kriging outperform the angular measure method for quantiles near 
0.5, likely due to the increased sharpness of these methods. Since the quan- 
tile scores are naturally larger near values of 0.5, the overall mean CRPS 
scores for kriging and indicator kriging end up lower. Gneiting and Ranjan 
(2011) also discuss a quantile weighted CRPS score 

l'myt,6 < -^')(4^iY,_r'(?') -y*.5)^(p)dp, 

where one can choose the weight function v{q) to emphasize quantiles of 
interest. Letting v{q) = I{v{q) > 0.85}, the mean weighted CRPS scores for 
the angular measure method, kriging and indicator kriging are 0.50, 0.57 
and 0.55. 
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Fig. 7. Mean GRPS score for the three methods decomposed by quantile p as in Gneiting 
and Ranjan (2011). Solid line is the angular measure method, dashed is kriging, dotted 
is ordinary kriging. The angular measure method performs best for high quantiles, but 
performs less well for the middle quantiles. 

5. Summary and discussion. In this work we obtain an approximation 
for the distribution of a component of a regularly varying random vector 
given that the observed components are large. We apply the approximation 
to estimate the conditional distribution of an air pollutant given nearby 
measurements that are large. Results show that our method outperforms 
traditional spatial prediction methods at capturing the conditional distribu- 
tion of the random variable when the observations are large. PIT histograms 
show that our method is better calibrated, and the method proves to be much 
better suited for obtaining probabilistic upper bounds of the pollutant level. 
For example, the estimated 95% quantiles provided by kriging and indicator 
kriging were too low and the actual exceedance rates were 17% and 14%, re- 
spectively. The exceedance rate of the angular measure method's estimated 
95% quantile was 7% and was within sampling error of 5%. 

We believe that this is the first work to perform prediction using ex- 
tremes techniques in a threshold exceedance setting. The classic theory that 
leads to max-stable distributions and processes is quite elegant and forms 
the foundation for all of extreme value theory. Statistical practice utilizing 
multivariate max-stability generally requires one to obtain component-wise 
block maximum data, and such data can be viewed as "artificial" in the 
sense that one models data vectors that are likely to have never occurred, 
since the block maxima are likely to occur at different times. It seems nat- 
ural to try and attempt to describe large concurrent observations, and the 
framework of multivariate regular variation allows this. 
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Our method relies on an adequate angular measure model. There has 
been some renewed interest of late in constructing flexible models which 
meet the moment conditions (5) [Cooley, Davis and Naveau (2010), Ballani 
and Schlather (2011), Boldi and Davison (2007)]. However, no model with 
a finite parameterization can completely describe the possible angular mea- 
sures, and the existing models may not prove to adequately model every 
multivariate data set. These models become unwieldy as the dimension in- 
creases beyond moderate levels (d~ 5). Certainly there remains a need for 
flexible multivariate extremes models. 

Although we apply our method to multivariate time series data, we do not 
make use of any temporal dependence in the data. Our method proceeds as if 
the sequence of multivariate random vectors are i.i.d. One could extend the 
method by allowing the marginal distributions to vary in time; such extreme 
value models are regularly used [e.g., Beirlant et al. (2004), chapter 7] and 
might be required if, for instance, the seasonality of this data had been more 
influential. 
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